

Search for: All records

Creators/Authors contains: "Guo, Hongyu"


  1. Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks. 
    Free, publicly-accessible full text available March 27, 2026
  2. Abstract. Field-deployable real-time aerosol mass spectrometers (AMSs) typically use an aerodynamic lens as an inlet that collimates aerosols into a narrow beam over a wide range of particle sizes. Such lenses need constant upstream pressure to work consistently. Deployments in environments where the ambient pressure changes, e.g., on aircraft, typically use pressure-controlled inlets (PCIs). These have performed less well for supermicron aerosols, such as the larger particles in stratospheric air and some urban hazes. In this study, we developed and characterized a new PCI design (“CU PCI-D”) coupled with a recently developed PM2.5 aerodynamic lens, with the goal of sampling the full accumulation mode of ambient aerosols with minimal losses up to upper troposphere and lower stratosphere (UTLS) altitudes. A new computer-controlled lens alignment system and a new 2D particle beam imaging device that improves upon the Aerodyne aerosol beam width probe (BWP) have been developed and tested. These techniques allow for fast automated aerosol beam width and position measurements and ensure the aerodynamic lens is properly aligned and characterized for accurate quantification, in particular for small sizes that are hard to access with monodisperse measurements. The automated lens alignment tool also allows position-dependent thermal decomposition to be investigated on the vaporizer surface. The CU PCI-D was tested on the TI3GER campaign aboard the NCAR/NSF G-V aircraft. Based on comparisons with the co-sampling UHSAS particle sizer, the CU aircraft AMS with the modified PCI consistently measured ∼ 89 % of the accumulation-mode particle mass in the UTLS. 
    Free, publicly-accessible full text available January 1, 2026
  3. Active exoskeletons are emerging as ergonomic solutions in the construction sector to reduce work-related musculoskeletal injuries. While the benefits of active exoskeletons are promising, they can also cause increased muscle activity, leading to local muscular fatigue. This study aimed to examine the impact of the active exoskeleton system on the muscular activity of construction workers during common construction activities. Ten subjects completed material handling tasks under two weight conditions (10 and 30 lbs) in a lab-controlled environment, with and without using an active exoskeleton. Portable electromyography (EMG) sensors were used to measure lumbar erector spinae (LES) muscle activity in each condition. Four descriptive statistics features in the time and frequency domains were extracted from the collected signals. Results of the t-test showed a significant difference in the physiological metrics extracted from the subjects’ EMG signals of the LES muscle. Findings demonstrated that using active exoskeletons reduces the internal muscle force in the lower back regions of construction workers. 
  4. Abstract. On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods. 
  5. Abstract. Motivation: MHC Class I protein plays an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The repertoires of peptides for various MHC Class I proteins are distinct, which can be reflected by their diverse binding motifs. To characterize binding motifs for MHC Class I proteins, in vitro experiments have been conducted to screen peptides with high binding affinities to hundreds of given MHC Class I proteins. However, considering tens of thousands of known MHC Class I proteins, conducting in vitro experiments for extensive MHC proteins is infeasible, and thus a more efficient and scalable way to characterize binding motifs is needed. Results: We presented a de novo generation framework, coined PepPPO, to characterize binding motif for any given MHC Class I proteins via generating repertoires of peptides presented by them. PepPPO leverages a reinforcement learning agent with a mutation policy to mutate random input peptides into positive presented ones. Using PepPPO, we characterized binding motifs for around 10 000 known human MHC Class I proteins with and without experimental data. These computed motifs demonstrated high similarities with those derived from experimental data. In addition, we found that the motifs could be used for the rapid screening of neoantigens at a much lower time cost than previous deep-learning methods. Availability and implementation: The software can be found in https://github.com/minrq/pMHC. Supplementary information: Supplementary data are available at Bioinformatics online. 
  6. Abstract. Quantifying ecosystem resilience to disturbance is important for understanding the effects of disturbances on ecosystems, especially in an era of rapid global change. However, there are few studies that have used standardized experimental disturbances to compare resilience patterns across abiotic gradients in real‐world ecosystems. Theoretical studies have suggested that increased return times are associated with increasing variance during recovery from disturbance. However, this notion has rarely been explicitly tested in the field, in part due to the challenges involved in obtaining long‐term experimental data. In this study, we examined resilience to disturbance of 12 coastal marsh sites (five low‐salinity and seven polyhaline [=salt] marshes) along a salinity gradient in Georgia, USA. We found that recovery times after experimental disturbance ranged from 7 to >127 months, and differed among response variables (vegetation height, cover and composition). Recovery rates decreased along the stress gradient of increasing salinity, presumably due to stress reducing plant vigor, but only when low‐salinity and polyhaline sites were analyzed separately, indicating a strong role for traits of dominant plant species. The coefficient of variation of vegetation cover and height in control plots did not vary with salinity. In disturbed plots, however, the coefficient of variation (CV) was consistently elevated during the recovery period and increased with salinity. Moreover, higher CV values during recovery were correlated with slower recovery rates. Our results deepen our understanding of resilience to disturbance in natural ecosystems, and point to novel ways that variance can be used either to infer recent disturbance, or, if measured in areas with a known disturbance history, to predict recovery patterns. 
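
The coefficient of variation (CV) mentioned in record 6 is conventionally the standard deviation divided by the mean. The sketch below shows that calculation only; it is not the authors' analysis code. The function name `coefficient_of_variation` and the cover values are hypothetical and chosen purely for illustration, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical vegetation-cover percentages (illustrative only, not study data).
control_cover = [80.0, 82.0, 78.0, 81.0, 79.0]
disturbed_cover = [20.0, 55.0, 35.0, 70.0, 45.0]

cv_control = coefficient_of_variation(control_cover)
cv_disturbed = coefficient_of_variation(disturbed_cover)
print(f"control CV = {cv_control:.3f}, disturbed CV = {cv_disturbed:.3f}")
```

Because CV is unitless, it can be compared across response variables measured on different scales, such as vegetation height and cover.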